108 research outputs found

    Characterizing Networking as Experienced by Users (White Paper)

    Demonstrating 100 Gbps in and out of the public Clouds

    There is growing recognition that public Cloud providers offer capabilities not found elsewhere, with elasticity being a major driver. The value of elastic scaling is, however, tightly coupled to the capabilities of the networks that connect all involved resources, both in the public Clouds and at the various research institutions. This paper presents results of measurements involving file transfers inside public Cloud providers, fetching data from on-prem resources into public Cloud instances, and fetching data from public Cloud storage into on-prem nodes. The networking of the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform, has been benchmarked. The on-prem nodes were either managed by the Pacific Research Platform or located at the University of Wisconsin - Madison. The observed sustained throughput was of the order of 100 Gbps in all the tests moving data in and out of the public Clouds, with throughput reaching into the Tbps range for data movements inside the public Cloud providers themselves. All the tests used HTTP as the transfer protocol.
    Comment: 4 pages, 6 figures, 3 tables

    Defining a canonical unit for accounting purposes

    Compute resource providers often put in place batch compute systems to maximize the utilization of their resources. However, compute nodes in such clusters, both physical and logical, contain several complementary resources, with notable examples being CPUs, GPUs, memory and ephemeral storage. User jobs will typically require more than one such resource, resulting in co-scheduling trade-offs of partial nodes, especially in multi-user environments. When accounting for either user billing or scheduling overhead, it is thus important to consider all such resources together. We therefore define the concept of a threshold-based "canonical unit" that combines several resource types into a single discrete unit, and use it to characterize scheduling overhead and make resource billing fairer for both resource providers and users. Note that the exact definition of a canonical unit is not prescribed and may change between resource providers. Nevertheless, we provide a template and two example definitions that we consider appropriate in the context of the Open Science Grid.
    Comment: 6 pages, 2 figures, To be published in proceedings of PEARC2
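
    The idea of a threshold-based canonical unit can be illustrated with a minimal sketch. The abstract does not prescribe a definition, so everything here is an assumption made for illustration: the rule (take the largest per-resource ratio, rounded up), the `CanonicalUnit` class, the `canonical_units` helper, and the threshold values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalUnit:
    """Per-unit thresholds. The values used below are illustrative assumptions,
    not definitions taken from the paper."""
    cpus: float
    memory_gb: float
    disk_gb: float


def canonical_units(request: dict, unit: CanonicalUnit) -> int:
    """Return the number of whole canonical units needed to cover every
    requested resource (assumed rule: max over resources of ceil(request / threshold))."""
    thresholds = {
        "cpus": unit.cpus,
        "memory_gb": unit.memory_gb,
        "disk_gb": unit.disk_gb,
    }
    needed = [
        math.ceil(request.get(name, 0) / limit)
        for name, limit in thresholds.items()
    ]
    # A job is always charged at least one unit.
    return max(1, *needed)


# Hypothetical CPU-oriented unit: 1 core, 4 GB of memory, 10 GB of scratch disk.
cpu_unit = CanonicalUnit(cpus=1, memory_gb=4, disk_gb=10)

# A memory-heavy job: 2 cores would need 2 units, but 20 GB of memory needs 5.
job = {"cpus": 2, "memory_gb": 20, "disk_gb": 5}
print(canonical_units(job, cpu_unit))  # → 5
```

    Under this assumed rule, the job is accounted for by its most demanding resource, which reflects the point that resources on a partially used node cannot be treated independently.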

    glideinWMS - A generic pilot-based Workload Management System

    The Grid resources are distributed among hundreds of independent Grid sites, requiring a higher-level Workload Management System (WMS) to be used efficiently. Pilot jobs have been used for this purpose by many communities, bringing increased reliability, global fair share and just-in-time resource matching. GlideinWMS is a WMS based on the Condor glidein concept, i.e. a regular Condor pool, with the Condor daemons (startds) being started by pilot jobs, and real jobs being vanilla, standard or MPI universe jobs. The glideinWMS is composed of a set of Glidein Factories, handling the submission of pilot jobs to a set of Grid sites, and a set of VO Frontends, requesting pilot submission based on the status of user jobs. This paper contains the structural overview of glideinWMS as well as a detailed description of the current implementation and the current scalability limits.

    Porting and optimizing UniFrac for GPUs

    UniFrac is a commonly used metric in microbiome research for comparing microbiome profiles to one another ("beta diversity"). The recently implemented Striped UniFrac added the capability to split the problem into many independent subproblems and exhibits near-linear scaling. In this paper we describe steps undertaken in porting and optimizing Striped UniFrac to GPUs. We reduced the run time of computing UniFrac on the published Earth Microbiome Project dataset from 13 hours on an Intel Xeon E5-2680 v4 CPU to 12 minutes on an NVIDIA Tesla V100 GPU, and to about one hour on a laptop with NVIDIA GTX 1050 (with minor loss in precision). Computing UniFrac on a larger dataset containing 113k samples reduced the run time from over one month on the CPU to less than 2 hours on the V100 and 9 hours on an NVIDIA RTX 2080TI GPU (with minor loss in precision). This was achieved by using OpenACC for generating the GPU offload code and by improving the memory access patterns. A BSD-licensed implementation is available, which produces a C shared library linkable by any programming language.
    Comment: 4 pages, 3 figures, 4 tables

    Characterizing network paths in and out of the clouds

    Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns, too. An important aspect of any scientific computing production campaign is data movement, both incoming and outgoing. And while the performance and cost of VMs are relatively well understood, the network performance and cost are not. This paper provides a characterization of networking in various regions of Amazon Web Services, Microsoft Azure and Google Cloud Platform, both between Cloud resources and major DTNs in the Pacific Research Platform, including OSG data federation caches in the network backbone, and inside the clouds themselves. The paper contains both a qualitative analysis of the results and latency and throughput measurements. It also includes an analysis of the costs involved with Cloud-based networking.
    Comment: 7 pages, 1 figure, 5 tables, to be published in CHEP19 proceedings

    Running a Pre-Exascale, Geographically Distributed, Multi-Cloud Scientific Simulation

    As we approach the Exascale era, it is important to verify that the existing frameworks and tools will still work at that scale. Moreover, public Cloud computing has been emerging as a viable solution for both prototyping and urgent computing. Using the elasticity of the Cloud, we have thus put in place a pre-exascale HTCondor setup for running a scientific simulation in the Cloud, with the chosen application being IceCube's photon propagation simulation. That is, this was not purely a demonstration run; it was also used to produce valuable and much-needed scientific results for the IceCube collaboration. In order to reach the desired scale, we aggregated GPU resources across 8 GPU models from many geographic regions across Amazon Web Services, Microsoft Azure, and the Google Cloud Platform. Using this setup, we reached a peak of over 51k GPUs corresponding to almost 380 PFLOP32s, for a total integrated compute of about 100k GPU hours. In this paper we provide the description of the setup, the problems that were discovered and overcome, as well as a short description of the actual science output of the exercise.
    Comment: 18 pages, 5 figures, 4 tables, to be published in Proceedings of ISC High Performance 202

    Testing GitHub projects on custom resources using unprivileged Kubernetes runners

    GitHub is a popular repository for hosting software projects, both due to ease of use and the seamless integration with its testing environment. Native GitHub Actions make it easy for software developers to validate new commits and have confidence that new code does not introduce major bugs. The freely available test environments are limited to only a few popular setups but can be extended with custom Action Runners. Our team had access to a Kubernetes cluster with GPU accelerators, so we explored the feasibility of automatically deploying GPU-providing runners there. All available Kubernetes-based setups, however, require cluster-admin level privileges. To address this problem, we developed a simple custom setup that operates in a completely unprivileged manner. In this paper we provide a summary description of the setup and our experience using it in the context of two Knight lab projects on the Prototype National Research Platform system.
    Comment: 5 pages, 1 figure, To be published in proceedings of PEARC2

    Microarchitecture: A useful tool to organize machines in heterogeneous shared computing environments

    The x86_64 instruction set architecture is not a single, consistent, compatible interface to execute computer programs. Since the initial release in 1999, every new generation has added new instructions, some of which were later removed. Most of these new instructions are intended to improve the performance of those programs which explicitly take advantage of them. However, running such a program on older CPUs without appropriate support results in a Linux SIGILL exception signal, which is difficult for end users to diagnose. On the other hand, compiling scientific code for the least common denominator ISA can leave significant performance on the table. High Throughput systems, containing a very large number of machines, cannot require a single CPU version across hundreds of thousands of machines operating in dozens of sites. The OSG Open Science Pool alone consists of more than 20 different, subtly incompatible x86_64 implementations. In 2020, Intel, AMD and Red Hat proposed new terminology and partitioned these dozens of microarchitectures into a strict hierarchy of four groups. The HTCondor Software Suite and the OSG now have first-class support for these microarchitectures. This paper discusses the advantages for users and future work around microarchitecture support.
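
    The strict hierarchy of four groups can be sketched as a lookup over CPU feature flags. This is a minimal illustration, not the HTCondor or OSG implementation: the flag lists reflect the publicly documented x86-64 microarchitecture levels as best understood here, written with the names Linux uses in /proc/cpuinfo, and the level labels, `microarch_level`, and `read_cpu_flags` are names chosen for this example.

```python
# Feature flags (Linux /proc/cpuinfo names) that each level adds on top of
# the previous one. Assumed from the published x86-64 level definitions;
# verify against the psABI document before relying on them.
LEVEL_FLAGS = [
    ("x86_64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86_64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86_64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def microarch_level(cpu_flags):
    """Return the highest level whose flags, and those of all lower levels,
    are present. Every x86_64 CPU is at least at the baseline level."""
    flags = set(cpu_flags)
    level = "x86_64-v1"
    for name, required in LEVEL_FLAGS:
        if not required <= flags:
            break  # strict hierarchy: a missing level blocks all higher ones
        level = name
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the feature flags of the first CPU listed (Linux only)."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


# Example with hand-written flag sets rather than a real machine.
v2_flags = set(LEVEL_FLAGS[0][1])
v3_flags = v2_flags | LEVEL_FLAGS[1][1]
print(microarch_level(v2_flags))  # → x86_64-v2
print(microarch_level(v3_flags))  # → x86_64-v3
```

    On a Linux host, `microarch_level(read_cpu_flags())` would report the level of the local machine. A scheduler could then match a binary compiled for a given level only to machines at that level or higher, which avoids the SIGILL failure described above.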